NVIDIA/Star-Attention: Efficient LLM Inference over Long Sequences
Download the model:
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct
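The long `-p` paths used in the commands below are not arbitrary: they follow the standard Hugging Face hub cache layout, where a repo id `org/name` is stored under `models--org--name/snapshots/<revision>`. A small sketch (assumption: default cache layout, this helper is not part of the repo) shows how to build such a path:

```python
# Sketch, assuming the standard Hugging Face cache layout
# (~/.cache/huggingface/hub/models--{org}--{name}/snapshots/{revision}).
# The helper name `snapshot_dir` is hypothetical, for illustration only.
from pathlib import Path

def snapshot_dir(repo_id: str, revision: str,
                 cache_root: str = "~/.cache/huggingface/hub") -> Path:
    """Build the expected local snapshot path for a downloaded model."""
    folder = "models--" + repo_id.replace("/", "--")
    return Path(cache_root).expanduser() / folder / "snapshots" / revision

print(snapshot_dir("meta-llama/Llama-3.1-8B-Instruct",
                   "0e9e39f249a16976918f6564b8830bc894c89659"))
```

After `huggingface-cli download` finishes, the revision hash can also be read off from the printed download path.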
Path of the Transformers Llama implementation (modeling_llama.py):
/home/cjl/miniconda3/envs/star/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
Launch:
- llama-3.1-8b
python run_babilong.py \
-n "llama3.1_8b" \
-p "/home/cjl/.cache/huggingface/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659" \
-pc llama3 \
-a star \
-bs 200 \
-l 8000 \
-np 4 \
--num_samples_per_task 1

python run_babilong.py \
-n "llama3.1_8b_babilong" \
-p "/home/cjl/.cache/huggingface/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659" \
-pc "llama3" \
-a "star" \
-bs 4096 \
-l 16384 \
-np 4 \
-t qa1

python run_babilong.py \
-n "llama3.1_8b_babilong" \
-p "/home/cjl/.cache/huggingface/hub/models--meta-llama--Llama-3.1-8B-Instruct/snapshots/0e9e39f249a16976918f6564b8830bc894c89659" \
-pc "llama3" \
-a "long" \
-bs 1024 \
-l 16384 \
-np 1 \
-t qa5
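How `-l`, `-bs` and `-np` interact: per my reading of the Star Attention paper (a sketch of the idea, not this repo's actual code), the context of length `-l` is split into contiguous blocks of size `-bs`, distributed over `-np` hosts, and every block after the first is processed together with a copy of the first ("anchor") block:

```python
# Illustrative sketch of Star Attention phase-1 context blocking
# (based on the paper's description, not taken from this repo's code).
def split_blocks(seq_len: int, block_size: int):
    """Return (start, end) index pairs for contiguous context blocks."""
    return [(s, min(s + block_size, seq_len))
            for s in range(0, seq_len, block_size)]

blocks = split_blocks(16384, 4096)  # matches -l 16384 -bs 4096 above
print(len(blocks))                  # 4 blocks, one per host with -np 4
```

So `-l 16384 -bs 4096 -np 4` gives exactly one block per host, while smaller block sizes mean each host handles several blocks.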
- llama-3-8b
python run_ruler.py \
-n "llama3_8b" \
-p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
-pc llama3 \
-a star \
-bs 16000 \
-l 64000 \
-np 4 \
-t vt \
--num_samples_per_task 1
Long text:
python run_ruler.py \
-n "llama3_8b" \
-p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
-pc llama3 \
-a star \
-bs 500 \
-l 2000 \
-np 4 \
-t qa_1 \
--num_samples_per_task 1
python run_ruler.py \
-n "input" \
-p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
-pc llama3 \
-a star \
-bs 500 \
-l 2000 \
-np 4 \
--num_samples_per_task 2
Launch ring (not the original Ring Attention):
python run_ruler.py \
-n "llama3_8b" \
-p "/home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87" \
-pc llama3 \
-a ring \
-bs 1024 \
-l 16384 \
-np 4 \
-t vt
Create the vt (variable tracking) data:
python /home/cjl/Star-Attention/ruler/data/synthetic/variable_tracking.py \
--save_dir /home/cjl/Star-Attention/tmp/1 \
--save_name vt \
--subset validation \
--tokenizer_path /home/cjl/.cache/huggingface/hub/models--gradientai--Llama-3-8B-Instruct-262k/snapshots/5c5269d53cb8e548f753074ce70b0c3ab325dd87 \
--tokenizer_type hf \
--max_seq_length 16384 \
--tokens_to_generate 30 \
--num_samples 5 \
--random_seed 42 \
--num_chains 1 \
--num_hops 4 \
--context_template "<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>
{task_template}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Memorize and track the chain(s) of variable assignment hidden in the following text.
{context}"
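For intuition on what `--num_chains` and `--num_hops` control: the vt task hides chains of variable assignments in the context, and the model must report which variables end up holding the original value. A toy mock-up (the real format is defined by RULER's variable_tracking.py; this only mimics the idea):

```python
# Toy illustration of a variable-tracking chain, NOT RULER's actual
# generator: one direct assignment followed by num_hops copy hops.
import random

def make_chain(num_hops: int, rng: random.Random):
    """Return the chain's assignment statements and the tracked value."""
    value = str(rng.randint(10000, 99999))
    names = [f"X{i}" for i in range(1, num_hops + 2)]
    statements = [f"VAR {names[0]} = {value}"]
    for prev, cur in zip(names, names[1:]):
        statements.append(f"VAR {cur} = VAR {prev}")
    return statements, value

stmts, value = make_chain(4, random.Random(42))  # matches --num_hops 4
print(stmts)
```

With `--num_chains 1 --num_hops 4`, one such chain is scattered through up to `--max_seq_length` tokens of filler text.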
Problem 1: flash_attn errors
[Solved] flash-attn error "flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol" - CSDN blog
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.3cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
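The undefined-symbol error typically means the prebuilt wheel was compiled against a different torch/CUDA/ABI/Python combination than the local environment. The wheel filename encodes the build it targets, so the fix is to pick the wheel whose tags match. A sketch of reading those tags (the parsing helper is my own, not part of flash-attn):

```python
# Sketch: decode the build tags embedded in a flash-attn wheel filename.
# An undefined-symbol error at import usually means one of these tags
# (CUDA version, torch version, cxx11 ABI, Python version) mismatches
# the local environment.
import re

WHEEL = ("flash_attn-2.6.3+cu123torch2.3cxx11abiFALSE"
         "-cp310-cp310-linux_x86_64.whl")

m = re.search(r"cu(\d+)torch([\d.]+)cxx11abi(TRUE|FALSE)-cp(\d+)", WHEEL)
cuda, torch_ver, cxx11abi, pyver = m.groups()
print(cuda, torch_ver, cxx11abi, pyver)  # 123 2.3 FALSE 310
```

Here the wheel expects CUDA 12.3, torch 2.3, the pre-cxx11 ABI, and Python 3.10, which matches the conda env used above.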
Problem 2: the RULER data did not download successfully
Some of RULER's dependencies are not installed automatically via pip.
Download the nltk data.
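A minimal setup sketch for the nltk part (assumption: nltk is one of the dependencies the RULER data scripts need but that was not pulled in automatically; `punkt` is the tokenizer data nltk most commonly requires):

```shell
# Install nltk manually, then fetch its punkt tokenizer data
# via nltk's built-in downloader CLI.
pip install nltk
python -m nltk.downloader punkt
```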